Sparse Suffix Tree Construction in Optimal Time and Space

Authors

  • Pawel Gawrychowski
  • Tomasz Kociumaka
Abstract

Suffix trees (and the closely related suffix arrays) are fundamental structures that capture all substrings of a given text, essentially by storing all its suffixes in lexicographic order. In some applications, such as sparse text indexing, we work with a subset of b interesting suffixes, which are stored in the so-called sparse suffix tree. Because the size of this structure is Θ(b), it is natural to seek a construction algorithm that uses only O(b) words of space, assuming read-only random access to the text. We design a linear-time Monte Carlo algorithm for this problem, thereby resolving an open question explicitly stated by Bille et al. [TALG 2016]. The best previously known algorithm, by I et al. [STACS 2014], works in O(n log b) time. As opposed to previous solutions, which were based on the divide-and-conquer paradigm, our solution proceeds in n/b rounds. In the r-th round, we consider all suffixes starting at positions congruent to r modulo n/b. By maintaining rolling hashes, we can lexicographically sort all interesting suffixes starting at such positions, and then merge them with the suffixes already considered. For efficient merging, we also need to answer LCE (longest common extension) queries efficiently and in small space. By plugging in the structure of Bille et al. [CPM 2015], we obtain O(n + b log b) time complexity. We improve this structure by a recursive application of so-called difference covers, which then implies a linear-time sparse suffix tree construction algorithm. We complement our Monte Carlo algorithm with a deterministic verification procedure. The verification takes O(n√(log b)) time, which improves upon the O(n log b) bound obtained by I et al. [STACS 2014]. This is obtained by first observing that the pruning done inside the previous solution has a rather clean description using the notion of graph spanners with small multiplicative stretch. Then, we decrease the verification time by applying difference covers twice. Combined with the Monte Carlo algorithm, this gives an O(n√(log b))-time and O(b)-space Las Vegas algorithm.

Work done while the author held a post-doctoral position at Warsaw Center of Mathematics and Computer Science. Supported by Polish budget funds for science in 2013-2017 as a research project under the ‘Diamond Grant’
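The role of rolling hashes and LCE queries described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it stores O(n) prefix fingerprints (so it does not meet the O(b) space bound), answers each LCE query by binary search in O(log n) time, and sorts the chosen suffixes with a comparison sort. The names `prefix_hashes`, `fingerprint`, `lce`, and `sparse_suffix_array`, as well as the constants `MOD` and `BASE`, are hypothetical and chosen only for illustration.

```python
import functools

MOD = (1 << 61) - 1   # a large prime modulus (assumption for this sketch)
BASE = 911382323      # fixed here; a Monte Carlo scheme would pick it at random


def prefix_hashes(text):
    """Return Karp-Rabin fingerprints of all prefixes and the powers of BASE."""
    h = [0] * (len(text) + 1)
    p = [1] * (len(text) + 1)
    for i, ch in enumerate(text):
        h[i + 1] = (h[i] * BASE + ord(ch)) % MOD
        p[i + 1] = (p[i] * BASE) % MOD
    return h, p


def fingerprint(h, p, i, length):
    """Fingerprint of text[i : i + length], derived from prefix fingerprints."""
    return (h[i + length] - h[i] * p[length]) % MOD


def lce(text, h, p, i, j):
    """Length of the longest common prefix of the suffixes starting at i and j.

    Correct unless two different substrings collide under the hash
    (the Monte Carlo aspect).
    """
    lo, hi = 0, len(text) - max(i, j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fingerprint(h, p, i, mid) == fingerprint(h, p, j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def sparse_suffix_array(text, positions):
    """Sort the given suffix start positions in lexicographic suffix order."""
    h, p = prefix_hashes(text)
    n = len(text)

    def compare(i, j):
        if i == j:
            return 0
        k = lce(text, h, p, i, j)
        if i + k == n:   # suffix i is a proper prefix of suffix j
            return -1
        if j + k == n:   # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[i + k] < text[j + k] else 1

    return sorted(positions, key=functools.cmp_to_key(compare))
```

For example, `sparse_suffix_array("banana", [0, 2, 4])` orders the suffixes "banana", "nana", and "na" by comparing one character after their longest common extension. The sorted positions give the leaf order of the sparse suffix tree, and the LCE values of adjacent suffixes give the depths of its branching nodes.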

Similar resources

Sparse Suffix Tree Construction with Small Space

We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of size n, using only O(b) words of space during construction time. Breaking the naive bound of Ω(nb) time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by Kärkkäinen and Ukkonen in 1996. ...

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

We consider the problem of encoding a string of length n from an alphabet [0, σ − 1] so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. A clear lower bound on the size of any prefix-free encoding of this kind is n log σ + Θ(log(nσ)) bits. We describe a new encoding matching this lower bound when σ ≤ n^O(1) while su...

On-Line Linear-Time Construction of Word Suffix Trees

Suffix trees are the key data structure for text string matching, and are used in wide application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and w be a string i...

Sparse compact directed acyclic word graphs

The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analy...

Faster Dynamic Compact Tries with Applications to Sparse Suffix Tree Construction and Other String Problems

The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. Jansson, Sadakane, and Sung (LNCS 4855, pp. 424-435, FSTTCS 2007) presented the dynamic uncompacted trie data structure of n nodes in O(n log σ) space supporting pattern matching in O((|P|/α) f(n)) time and insert/delete operations in O(f(n)) time, where f(n) = (log log n)/(log log log n) is th...

Publication date: 2017